Abstract

Classical game theory treats players as special: a
description of a game contains a full, explicit
enumeration of all players, even though in the real
world, “players” are no more fundamentally special
than rocks or clouds. It isn’t trivial to find a
decision-theoretic foundation for game theory in which
an agent’s coplayers are a non-distinguished part of
the agent’s environment. Attempts to model both players
and the environment as Turing machines, for example,
fail for standard diagonalization reasons.

In this paper, we introduce a “reflective” type of
oracle, which is able to answer questions about the
outputs of oracle machines with access to the same
oracle. These oracles avoid diagonalization by
answering some queries randomly. We show that machines
with access to a reflective oracle can be used to
define rational agents using causal decision theory.
These agents model their environment as a probabilistic
oracle machine, which may contain other agents as a
non-distinguished part.

We show that if such agents interact, they will play a
Nash equilibrium, with the randomization in mixed
strategies coming from the randomization in the
oracle’s answers. This can be seen as providing a
foundation for classical game theory in which players
aren’t special.


1 Introduction 

Classical decision theory and game theory are founded
on the notion of a perfect Bayesian reasoner [2]. Such
an agent may be uncertain which of several possible
worlds describes the state of its environment, but given
any particular possible world, it is able to deduce
exactly what outcome each of its available actions will
produce [3]. This assumption is, of course,
unrealistic [4, 5]: Agents in the real world must
necessarily be boundedly rational reasoners, which make
decisions with finite computational resources.
Nevertheless, the notion of a perfect Bayesian reasoner
provides an analytically tractable first approximation
to the behavior of real-world agents, and underlies an
enormous body of work in statistics [6], economics [7],
computer science [8], and other fields.

On closer examination, however, the assumption that
agents can compute what outcome each of their actions
leads to in every possible world is troublesome even if
we assume that agents have unbounded computing power.
For example, consider the game of Matching Pennies, in
which two players each choose between two actions
(“heads” and “tails”); if the players choose the same
action, the first player wins a dollar; if they choose
differently, the second player wins. Suppose further
that both players’ decision-making processes are Turing
machines with unlimited computing power. Finally,
suppose that both players know the exact state of the
universe at the time they begin deliberating about the
actions they are going to choose, including the source
code of their opponent’s decision-making algorithm.^1

In this set-up, by assumption, both agents know exactly
which possible world they are in. Suppose that they are
able to use this information to accurately predict their
opponent’s behavior. Since both players’ decision-making
processes are deterministic Turing machines, their
behavior is deterministic given the initial state of the
world; each player either definitely plays “heads” or
definitely plays “tails”. But neither of these
possibilities is consistent: For example, if the first
player chooses heads and the second player can predict
this, the second player will choose tails, but if the
first player can predict this in turn, it will choose
tails, contradicting the assumption that it chooses
heads.
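The inconsistency described above can be checked mechanically. The following is a minimal sketch, assuming each player simply best-responds to a correct prediction of the other; the function names and the encoding of “heads” and “tails” as 0 and 1 are illustrative choices, not part of the paper’s formalism.

```python
from itertools import product

HEADS, TAILS = 0, 1


def best_response_first(opponent_action):
    # The first player wins when the two actions match.
    return opponent_action


def best_response_second(opponent_action):
    # The second player wins when the two actions differ.
    return 1 - opponent_action


def consistent_pure_profiles():
    """Return every pure profile in which each action best-responds to the other."""
    return [
        (a1, a2)
        for a1, a2 in product((HEADS, TAILS), repeat=2)
        if a1 == best_response_first(a2) and a2 == best_response_second(a1)
    ]


print(consistent_pure_profiles())  # prints []
```

No pure profile survives, which is the contradiction that deterministic mutual prediction runs into.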

^1 The technique of quining (Kleene’s second recursion
theorem [9]) shows that it is possible to write two
programs that have access to each other’s source code.

The problem is caused by the assumption that given its
opponent’s source code, a player can figure out what
action the opponent will choose. One might think that
it could simply run its opponent’s source code, but if
the opponent does the same, both programs will go into
an infinite loop. Binmore [10], discussing the
philosophical justification for game-theoretic concepts
such as Nash equilibrium, puts this problem as follows:

In any case, if Turing machines are used to model
the players, it is possible to suppose that the
play of a game is prefixed by an exchange of the
players’ Gödel numbers... Within this framework, a
perfectly rational machine ought presumably to be
able to predict the behavior of the opposing
machines perfectly, since it will be familiar
with every detail of their design. And a universal
Turing machine can do this. What it cannot do is
predict its opponents’ behavior perfectly and
simultaneously participate in the action of the
game. It is in this sense that the claim that
perfect rationality is an unattainable ideal is to
be understood.

Even giving the players access to a halting oracle does
not help, because even though a machine with access to
a halting oracle can predict the behavior of an ordinary
Turing machine, it cannot in general predict the
behavior of another oracle machine.

Classical game theory resolves this problem by allowing
players to choose mixed strategies (probability
distributions over actions); for example, the unique
Nash equilibrium of Matching Pennies is for each player
to assign “heads” and “tails” probability 0.5 each.
However, instead of treating players’ decision-making
algorithms as computable processes which are an ordinary
part of a world with computable laws of physics,
classical game theory treats players as special objects.
For example, to describe a problem in game-theoretic
terms, we must provide an explicit list of all relevant
players, even though in the real world, “players” are
ordinary physical objects, not fundamentally distinct
from objects such as rocks or clouds.

In this paper, we show that it is possible to define a
certain kind of probabilistic oracle (that is, an oracle
which answers some queries non-deterministically) such
that a Turing machine with access to this oracle can
perform perfect Bayesian reasoning about environments
that can themselves be described as oracle machines with
access to the same oracle. This makes it possible for
players to treat opponents simply as an ordinary part of
this environment.

When an environment contains multiple agents playing a
game against each other, the probabilistic behavior of
the oracle may cause the players’ behavior to be
probabilistic as well. We show that in this case, the
players will always play a Nash equilibrium, and for
every particular Nash equilibrium there is an oracle
that causes the players to behave according to this
equilibrium. In this sense, our work can be seen as
providing a foundation for classical game theory,
demonstrating that the special treatment of players in
the classical theory is not fundamental.

The oracles we consider are not halting oracles;
instead, roughly speaking, they allow oracle machines
with access to such an oracle to determine the
probability distribution of outputs of other machines
with access to the same oracle. Because of their ability
to deal with self-reference, we refer to these oracles
as reflective oracles.

2 Reflective Oracles 

In many situations, programs would like to predict the
output of other programs. They could simulate the other
program in order to do this. However, this method fails
when there are cycles (e.g. program A is concerned with
the output of program B which is concerned with the
output of program A). Furthermore, if a procedure to
determine the output of another program existed, then it
would be possible to construct a liar’s paradox of the
form “if I return 1, then return 0, otherwise return 1”.

These paradoxes can be resolved by using probabilities.
Let $\mathcal{M}$ be the set of probabilistic oracle
machines, defined here as Turing machines which can
execute special instructions to (i) flip a coin that has
an arbitrary rational probability of coming up heads,
and to (ii) call an oracle $O$, whose behavior might
itself be probabilistic.

Roughly speaking, the oracle answers questions of the
form: “Is the probability that machine $M$ returns 1
greater than $p$?” Thus, $O$ takes two inputs, a machine
$M \in \mathcal{M}$ and a rational probability
$p \in [0,1] \cap \mathbb{Q}$, and returns either 0 or 1.
If $M$ is guaranteed to halt and to output either 0 or 1
itself, we want $O(M,p) = 1$ to mean that the
probability that $M$ returns 1 (when run with $O$) is at
least $p$, and $O(M,p) = 0$ to mean that it is at most
$p$; if it is equal to $p$, both conditions are true,
and the oracle may answer randomly. In summary,

$$\mathbb{P}(M^O() = 1) > p \implies \mathbb{P}(O(M,p) = 1) = 1$$

$$\mathbb{P}(M^O() = 1) < p \implies \mathbb{P}(O(M,p) = 0) = 1$$

where we write $\mathbb{P}(M^O() = 1)$ for the
probability that $M$ returns 1 when run with oracle $O$,
and $\mathbb{P}(O(M,p) = 1)$ for the probability that
the oracle returns 1 on input $(M,p)$. We assume that
different calls to the oracle are stochastically
independent events (even if they are about the same pair
$(M,p)$); hence, the behavior of an oracle $O$ is fully
specified by the probabilities $\mathbb{P}(O(M,p) = 1)$.
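The two implications above can be read as a simple check on a single query. The sketch below assumes that the probability that the machine returns 1 is already known, which is exactly what a real oracle machine cannot assume; the function name and argument names are hypothetical.

```python
from fractions import Fraction


def satisfies_reflection(prob_machine_returns_1, p, prob_oracle_returns_1):
    """Check the two reflection conditions for one query (M, p)."""
    if prob_machine_returns_1 > p:
        # The oracle must answer 1 with certainty.
        return prob_oracle_returns_1 == 1
    if prob_machine_returns_1 < p:
        # The oracle must answer 0 with certainty.
        return prob_oracle_returns_1 == 0
    # Exactly at the threshold, any answer distribution is allowed.
    return True


fair_coin = Fraction(1, 2)  # a machine that returns 1 with probability 1/2
print(satisfies_reflection(fair_coin, Fraction(1, 3), 1))               # True
print(satisfies_reflection(fair_coin, Fraction(2, 3), 1))               # False
print(satisfies_reflection(fair_coin, Fraction(1, 2), Fraction(1, 4)))  # True
```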

Definition A query (with respect to a particular
oracle $O$) is a pair $(M,p)$, where
$p \in [0,1] \cap \mathbb{Q}$ and $M^O()$ is a
probabilistic oracle machine which almost surely halts
and returns an element of $\{0,1\}$.

Definition An oracle is called reflective on $R$, where
$R$ is a set of queries, if it satisfies the two
conditions displayed above for every $(M,p) \in R$. It
is called reflective if it is reflective on the set of
all queries.
Theorem 2.1. (i) There is a reflective oracle.

(ii) For any oracle $O$ and every set of queries $R$,
there is an oracle $O'$ which is reflective on $R$ and
satisfies $\mathbb{P}(O'(M,p) = 1) = \mathbb{P}(O(M,p) = 1)$
for all $(M,p) \notin R$.

Proof. For the proof of (ii), see Appendix B; see also
Theorem 5.1, which gives a more elementary proof of a
special case. Part (i) follows from part (ii) by
choosing $R$ to be the set of all queries and letting
$O$ be arbitrary. □

As an example, consider the machine given by
$M^O() = 1 - O(M, 0.5)$, which implements a version of
the liar paradox by asking the oracle what it will
return and then returning the opposite. By the existence
theorem, there is an oracle which is reflective on
$R = \{(M, 0.5)\}$. This is no contradiction: We can set
$\mathbb{P}(O(M,0.5) = 1) = \mathbb{P}(O(M,0.5) = 0) = 0.5$,
leading the program to output 1 half the time and 0 the
other half of the time.
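As a small sanity check of this example, the following sketch scans candidate values $q = \mathbb{P}(O(M,0.5) = 1)$ on a grid and keeps those that satisfy the reflection conditions for the liar machine. The grid resolution and the helper names are arbitrary choices made for illustration.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def liar_prob_of_1(q):
    # M returns 1 - O(M, 1/2), so it returns 1 exactly when the oracle answers 0.
    return 1 - q


def is_reflective_for_liar(q):
    """q is the probability that the oracle answers 1 on the query (M, 1/2)."""
    m = liar_prob_of_1(q)
    if m > HALF:
        return q == 1
    if m < HALF:
        return q == 0
    return True


grid = [Fraction(k, 100) for k in range(101)]
consistent = [q for q in grid if is_reflective_for_liar(q)]
print(consistent)  # prints [Fraction(1, 2)]
```

On this grid, the only consistent answer distribution is the even one, matching the assignment described in the text.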

3 From Reflective Oracles to Causal
Decision Theory

We now show how reflective oracles can be used to
implement a perfect Bayesian reasoner. We assume that
each possible environment that this agent might find
itself in can likewise be modeled as an oracle machine;
that is, we assume that the laws of physics are
computable by a probabilistic Turing machine with access
to the same reflective oracle as the agent. For example,
we might imagine our agent as being embedded in a
Turing-complete probabilistic cellular automaton, whose
laws are specified in terms of the oracle.

We assume that each of the agent’s hypotheses about
which environment it finds itself in can be modeled by
a (possibly probabilistic) “world program” $H^O()$,
which simulates this environment and returns a
description of what happened. We can then define a
machine $W^O()$ which samples a hypothesis $H$ according
to the agent’s probability distribution and runs
$H^O()$. In the sequel, we will talk about $W^O()$ as if
it refers to a particular environment, but this machine
is assumed to incorporate subjective uncertainty about
the laws of physics and the initial state of the world.

We further assume that the agent’s decision-making
process, $A^O()$, can be modeled as a probabilistic
oracle machine embedded in this environment. As a simple
example, consider the world program

$$W^O() = \begin{cases} \$20 & \text{if } A^O() = 0 \\ \$15 & \text{otherwise} \end{cases}$$

In this world, the outcome is \$20 (which in this case
means the agent receives \$20) if the agent chooses
action 0 and \$15 if the agent chooses action 1.

Our task is to find an appropriate implementation of
$A^O()$. Here, we consider agents implementing causal
decision theory (CDT) [11], which evaluates actions
according to the consequences they cause: For example,
if the agent is a robot embedded in a cellular
automaton, it might evaluate the expected utility of
taking action 0 or 1 by simulating what would happen in
the environment if the output signal of its
decision-making component were replaced by either 0 or 1.

We will assume that the agent’s model of the
counterfactual consequences of taking different actions
$a$ is described by a machine $W^O(a)$, satisfying
$W^O() = W^O(A^O())$, since in the real world, the agent
takes action $a = A^O()$. In our example,

$$W^O(a) = \begin{cases} \$20 & \text{if } a = 0 \\ \$15 & \text{otherwise} \end{cases}$$


We assume that the agent has a utility function over
outcomes, $u(\cdot)$, implemented as a lookup table,
which takes rational values in $[0,1]$.^2 Furthermore,
we assume that both $W^O(0)$ and $W^O(1)$ halt almost
surely and return a value in the domain of $u(\cdot)$.
Causal decision theory then prescribes choosing the
action that maximizes expected utility; in other words,
we want to find an $A^O()$ such that

$$A^O() = \operatorname{argmax}_a \, \mathbb{E}[u(W^O(a))]$$

In the case of ties, any action maximizing utility is
allowed, and it is acceptable for $A^O()$ to randomize.

We cannot compute this expectation by simply running
$u(W^O(a))$ many times to obtain samples, since the
environment might contain other agents of the same type,
potentially leading to infinite loops. However, we can
find an optimal action by making use of a reflective
oracle. This is easiest when the agent has only two
actions (0 and 1), but similar analysis extends to any
number of actions. Define a machine

$$E^O() := \mathrm{flip}\left(\frac{u(W^O(1)) - u(W^O(0)) + 1}{2}\right)$$

where $\mathrm{flip}(p)$ is a probabilistic function
that returns 1 with probability $p$ and 0 with
probability $1 - p$.

Theorem 3.1. $O$ is reflective on $\{(E, 1/2)\}$ if and
only if $A^O() := O(E, 1/2)$ returns a utility-maximizing
action.

Proof. The demand that $A^O()$ return a
utility-maximizing action is equivalent to

$$\mathbb{E}[u(W^O(1))] > \mathbb{E}[u(W^O(0))] \implies A^O() = 1$$

$$\mathbb{E}[u(W^O(1))] < \mathbb{E}[u(W^O(0))] \implies A^O() = 0$$

We have

$$\mathbb{P}(E^O() = 1) = \mathbb{E}\left[\frac{u(W^O(1)) - u(W^O(0)) + 1}{2}\right]$$

It is not difficult to check that
$\mathbb{E}[u(W^O(1))] \gtrless \mathbb{E}[u(W^O(0))]$ iff
$\mathbb{P}(E^O() = 1) \gtrless 1/2$. Together with the
definition of $A^O()$, we can use this to rewrite the
above conditions as

$$\mathbb{P}(E^O() = 1) > 1/2 \implies O(E, 1/2) = 1$$

$$\mathbb{P}(E^O() = 1) < 1/2 \implies O(E, 1/2) = 0$$

But this is precisely the definition of “$O$ is
reflective on $\{(E, 1/2)\}$”. □

^2 Since the meaning of utility functions is invariant
under affine transformations, the choice of the
particular interval $[0,1]$ is no restriction.
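To make the role of the machine $E$ concrete, the sketch below evaluates $\mathbb{P}(E^O() = 1)$ for the \$20/\$15 world program and reads off the action that reflection on $(E, 1/2)$ forces. The utility values $u(\$20) = 1$ and $u(\$15) = 3/4$ are assumptions made purely for illustration, as are all function names.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# Hypothetical utility lookup table with rational values in [0, 1].
UTILITY = {"$20": Fraction(1), "$15": Fraction(3, 4)}


def expected_utility(outcome_distribution):
    """Expected utility of a distribution given as {outcome: probability}."""
    return sum(prob * UTILITY[outcome] for outcome, prob in outcome_distribution.items())


def prob_E_returns_1(world_given_1, world_given_0):
    # By linearity of expectation, P(E = 1) = (E[u(W(1))] - E[u(W(0))] + 1) / 2.
    return (expected_utility(world_given_1) - expected_utility(world_given_0) + 1) / 2


def action_forced_by_reflection(prob_e):
    """Answer that a reflective oracle must give on (E, 1/2); None means a tie."""
    if prob_e > HALF:
        return 1
    if prob_e < HALF:
        return 0
    return None


world_given_0 = {"$20": Fraction(1)}  # W(0) returns $20 with certainty
world_given_1 = {"$15": Fraction(1)}  # W(1) returns $15 with certainty
prob_e = prob_E_returns_1(world_given_1, world_given_0)
print(prob_e, action_forced_by_reflection(prob_e))  # prints 3/8 0
```

Under these assumed utilities, the probability falls below 1/2, so the oracle answers 0 and the agent takes the \$20 action.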

In order to handle agents which can choose between 
more than two actions, we can compare action 0 to 
action 1, then compare action 2 to the best of actions 
0 and 1, then compare action 3 to the best of the first 
three actions, and so on. Adding more actions in this 
fashion does not substantially change the analysis. 
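The sequential comparison just described can be sketched as a simple loop. In this illustration the pairwise oracle query is replaced by a hypothetical stand-in that is handed the two expected utilities directly; an actual agent would instead query the oracle about a machine analogous to $E$ for each pair.

```python
from fractions import Fraction


def oracle_prefers_challenger(eu_challenger, eu_incumbent):
    """Hypothetical stand-in for a query O(E, 1/2) comparing two actions."""
    # A strictly better challenger must yield 1, a strictly worse one 0.
    # On ties either answer is allowed; this sketch keeps the incumbent.
    return 1 if eu_challenger > eu_incumbent else 0


def best_action(expected_utilities):
    """Compare action 1 to action 0, then each later action to the best so far."""
    best = 0
    for action in range(1, len(expected_utilities)):
        if oracle_prefers_challenger(expected_utilities[action], expected_utilities[best]) == 1:
            best = action
    return best


utilities = [Fraction(3, 10), Fraction(7, 10), Fraction(1, 2), Fraction(7, 10)]
print(best_action(utilities))  # prints 1
```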

4 From Causal Decision Theory to
Nash Equilibria

Since we have taken care to define our agents’ world
models $W^O(a)$ in such a way that they can embed other
agents,^3 we need not do anything special to pass from
single-agent to multi-agent settings. As in the
single-agent case, we model the environment by a program
$W^O()$ that contains embedded agent programs
$A_1^O, \ldots, A_n^O$ and returns an outcome. We can
make the dependency on the agent programs explicit by
writing $W^O() = F^O(A_1^O(), \ldots, A_n^O())$ for some
oracle machine $F^O(\cdot)$. This allows us to define
machines

$$W_i^O(a_i) := F^O(a_i, A_{-i}^O()) := F^O(A_1^O(), \ldots, A_{i-1}^O(), a_i, A_{i+1}^O(), \ldots, A_n^O())$$

representing the causal effects of player $i$ taking
action $a_i$.

We assume that each agent has a utility function
$u_i(\cdot)$ of the same type as in the previous section.
Hence, we can define the agent programs $A_i^O()$ just
as before:

$$A_i^O() = O(E_i, 1/2)$$

$$E_i^O() := \mathrm{flip}\left(\frac{u_i(W_i^O(1)) - u_i(W_i^O(0)) + 1}{2}\right)$$

Here, each $E_i^O()$ calls $W_i^O(\cdot)$, which calls
$A_j^O()$ for each $j \neq i$, which refers to the
source code of $E_j^O()$, but again, Kleene’s second
recursion theorem shows that this kind of self-reference
poses no theoretical problem [9].

This setup very much resembles the setting of 
normal-form games. In fact: 

^3 More precisely, we have only required that $W^O(a)$
always halt and produce a value in the domain of the
utility function $u(\cdot)$. Since all our agents do is
to perform a single oracle call, they always halt,
making them safe to call from $W^O(a)$.


Theorem 4.1. Given an oracle $O$, consider the
$n$-player normal-form game in which the payoff of
player $i$, given the pure strategy profile
$(a_1, \ldots, a_n)$, is
$\mathbb{E}[u_i(F^O(a_1, \ldots, a_n))]$. The mixed
strategy profile given by $s_i := \mathbb{P}(A_i^O() = 1)$
is a Nash equilibrium of this game if and only if $O$ is
reflective on $\{(E_1, 1/2), \ldots, (E_n, 1/2)\}$.

Proof. For $(s_1, \ldots, s_n)$ to be a Nash equilibrium
is equivalent to every player’s mixed strategy being a
best response; i.e., a pure strategy $a_i$ can only be
assigned positive probability if it maximizes

$$\mathbb{E}[u_i(F^O(a_i, A_{-i}^O()))] = \mathbb{E}[u_i(W_i^O(a_i))]$$

By an application of Theorem 3.1, this is equivalent to
$O$ being reflective on $\{(E_i, 1/2)\}$ for each $i$. □

Note that, in particular, any normal-form game with
rational-valued payoffs can be represented in this way
by simply choosing $F^O$ to be the identity function. In
this case, the theorem shows that every reflective
oracle (which exists by Theorem 2.1(i)) gives rise to a
Nash equilibrium. In the other direction, Theorem 4.1
together with Theorem 2.1(ii) shows that for any Nash
equilibrium $(s_1, \ldots, s_n)$ of the normal-form game,
there is a reflective oracle such that
$\mathbb{P}(A_i^O() = 1) = s_i$.
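The correspondence can be illustrated on Matching Pennies. The sketch below assumes win/lose utilities of 1 and 0, takes $F^O$ to be the identity, and scans mixed strategy profiles on a coarse grid, keeping those for which the reflection conditions on $(E_1, 1/2)$ and $(E_2, 1/2)$ hold. All names and the grid are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

# Assumed utilities: player 0 wins on a match, player 1 wins on a mismatch.
U = [
    {(0, 0): 1, (1, 1): 1, (0, 1): 0, (1, 0): 0},
    {(0, 0): 0, (1, 1): 0, (0, 1): 1, (1, 0): 1},
]


def prob_E(i, s):
    """P(E_i = 1) when the other player uses action 1 with probability s[1 - i]."""
    other = 1 - i

    def expected_utility(a_i):
        total = Fraction(0)
        for a_other in (0, 1):
            prob = s[other] if a_other == 1 else 1 - s[other]
            profile = (a_i, a_other) if i == 0 else (a_other, a_i)
            total += prob * U[i][profile]
        return total

    return (expected_utility(1) - expected_utility(0) + 1) / 2


def reflective(s):
    """Check the reflection conditions, using s[i] = P(O(E_i, 1/2) = 1)."""
    for i in (0, 1):
        pe = prob_E(i, s)
        if pe > HALF and s[i] != 1:
            return False
        if pe < HALF and s[i] != 0:
            return False
    return True


grid = [Fraction(k, 10) for k in range(11)]
equilibria = [s for s in product(grid, repeat=2) if reflective(s)]
print(equilibria)  # prints [(Fraction(1, 2), Fraction(1, 2))]
```

The only profile that survives on this grid is the familiar mixed equilibrium in which both players randomize evenly.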

5 From Nash Equilibria to Reflective 
Oracles 

In the previous section, we showed that a reflective
oracle can be used to find Nash equilibria in arbitrary
normal-form games. It is interesting to note that we can
also go in the other direction: For finite sets $R$
satisfying certain conditions, we can construct
normal-form games $G_R$ such that the existence of
oracles reflective on $R$ follows from the existence of
Nash equilibria in $G_R$. This existence theorem is a
special case of Theorem 2.1, but it provides a more
elementary proof as well as a constructive way of
finding such oracles (by applying any algorithm for
finding Nash equilibria to $G_R$).

Definition A set $R$ of queries is closed if for every
$(M,p) \in R$ and every oracle $O$, $M^O()$ is
guaranteed to only invoke the oracle on pairs
$(N,q) \in R$. It is bounded if there is some bound
$B_R \in \mathbb{N}$ such that for every $(M,p) \in R$
and every oracle $O$, $M^O()$ is guaranteed to invoke
the oracle at most $B_R$ times.

Definition Given a finite set
$R = \{(M_1,p_1), \ldots, (M_n,p_n)\}$ and a vector
$x \in [0,1]^n$, define $O_x$ to be the oracle
satisfying $\mathbb{P}(O_x(M_i,p_i) = 1) = x_i$ for
$i = 1, \ldots, n$, and $\mathbb{P}(O_x(M,p) = 1) = 0$
for $(M,p) \notin R$.

Theorem 5.1. For any finite, closed, bounded set
$R = \{(M_1,p_1), \ldots, (M_n,p_n)\}$, there is a
normal-form game $G_R$ with $m := n \cdot (2B_R + 1)$
players, each of which has two pure strategies, such
that for any Nash equilibrium strategy profile
$(s_1, \ldots, s_m)$, the oracle $O_x$ with
$x := (s_1, \ldots, s_n)$ is reflective on $R$.

Proof. We divide the $n \cdot (2B_R + 1)$ players in our
game into three sets: the main players
$i = 1, \ldots, n$, the copy players
$g(i,j) := j \cdot n + i$, and the auxiliary players
$h(i,j) := (B_R + j) \cdot n + i$, for
$i = 1, \ldots, n$, $j = 1, \ldots, B_R$.
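The indexing scheme can be checked on a small instance. The sketch below enumerates the three groups for hypothetical values $n = 3$ and $B_R = 2$ and confirms that they partition the player indices $1, \ldots, n \cdot (2B_R + 1)$; the function names are illustrative.

```python
def copy_player(i, j, n):
    # g(i, j) := j * n + i
    return j * n + i


def auxiliary_player(i, j, n, bound):
    # h(i, j) := (B_R + j) * n + i
    return (bound + j) * n + i


def all_players(n, bound):
    """Return the main, copy, and auxiliary player indices as three lists."""
    main = list(range(1, n + 1))
    copies = [copy_player(i, j, n) for j in range(1, bound + 1) for i in range(1, n + 1)]
    aux = [auxiliary_player(i, j, n, bound) for j in range(1, bound + 1) for i in range(1, n + 1)]
    return main, copies, aux


main, copies, aux = all_players(n=3, bound=2)
print(main)    # prints [1, 2, 3]
print(copies)  # prints [4, 5, 6, 7, 8, 9]
print(aux)     # prints [10, 11, 12, 13, 14, 15]
```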

The mixed strategy $s_i$ of a main player $i$ will
determine the probability that $O_x(M_i,p_i) = 1$. We
will force $s_{g(i,j)} = s_i$, i.e., we will force the
mixed strategy of each copy player to equal that of the
corresponding main player; thus, the copy players will
provide us with independent samples from the
Bernoulli($s_i$) distribution, allowing us to simulate
up to $B_R$ independent calls to $O_x(M_i,p_i)$.
Finally, the auxiliary players are used to enforce the
constraint $s_{g(i,j)} = s_i$, by having the copy player
$g(i,j)$ play a variant of Matching Pennies against the
auxiliary player $h(i,j)$.

In order to define the game’s payoff function, note
first that by writing out each possible way that the at
most $B_R$ oracle calls of $M_i^{O_x}()$ might come out,
we can write the probability that this machine returns 1
as a polynomial,

$$\mathbb{P}(M_i^{O_x}() = 1) = \sum_{k=1}^{K} c_{i,k} \prod_{i'=1}^{n} x_{i'}^{d_{i,k,i'}}$$

where $d_{i,k,i'} \le B_R$. We want to force the main
player $i$ to choose pure strategy 1 if this probability
is strictly greater than $p_i$, pure strategy 0 if it is
strictly smaller.

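To do so, we set player $i$’s payoff function
$u_i(\vec{a})$ to

$$u_i(\vec{a}) = \begin{cases} \sum_{k=1}^{K} f_{i,k}(\vec{a}) & \text{if } a_i = 1 \\ p_i & \text{otherwise} \end{cases}$$

where

$$f_{i,k}(\vec{a}) = \begin{cases} c_{i,k} & \text{if } a_{g(i',j)} = 1 \ \forall\, 1 \le i' \le n,\ 1 \le j \le d_{i,k,i'} \\ 0 & \text{otherwise} \end{cases}$$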

Then, assuming we can guarantee $s_{g(i,j)} = s_i$, the
expected payoff of strategy 1 to player $i$ is exactly
$\mathbb{P}(M_i^{O_x}() = 1)$, while the payoff of
strategy 0 is always $p_i$; hence, as desired, the Nash
equilibrium conditions force $i$ to choose 1 if the
probability is greater than $p_i$, 0 if it is smaller.
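The probability that a bounded machine returns 1 can be computed by enumerating the possible outcomes of its oracle calls, which is the expansion underlying the polynomial in the proof. The sketch below assumes a machine is given as a Python function that is deterministic apart from its oracle calls (coin flips are omitted for brevity) and that requests answers through a callback; this representation and all names are hypothetical.

```python
from fractions import Fraction


class NeedAnswer(Exception):
    """Raised when the machine asks for an oracle answer that is not yet fixed."""

    def __init__(self, query_index):
        super().__init__(query_index)
        self.query_index = query_index


def prob_returns_1(machine, x, max_calls):
    """P(machine returns 1) when query i is answered 1 with probability x[i]."""

    def explore(prefix):
        position = 0

        def ask(query_index):
            nonlocal position
            if position == len(prefix):
                raise NeedAnswer(query_index)
            answer = prefix[position]
            position += 1
            return answer

        try:
            output = machine(ask)
        except NeedAnswer as need:
            if len(prefix) >= max_calls:
                raise ValueError("machine exceeded the assumed call bound")
            i = need.query_index
            # Branch on the next oracle answer, weighting by its probability.
            return (1 - x[i]) * explore(prefix + (0,)) + x[i] * explore(prefix + (1,))
        return Fraction(output)

    return explore(())


def both_ones(ask):
    # Returns 1 only if two independent calls on query 0 both answer 1.
    return 1 if ask(0) == 1 and ask(0) == 1 else 0


def xor_of_two(ask):
    # Returns the XOR of one answer on query 0 and one answer on query 1.
    return ask(0) ^ ask(1)


print(prob_returns_1(both_ones, [Fraction(1, 3)], max_calls=2))                   # prints 1/9
print(prob_returns_1(xor_of_two, [Fraction(1, 2), Fraction(1, 3)], max_calls=2))  # prints 1/2
```

The first example corresponds to the monomial $x_0^2$ and the second to $x_0(1 - x_1) + (1 - x_0)x_1$, each of degree at most the call bound in every variable.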

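It remains to choose the payoffs
$(u_{g(i,j)}(\vec{a}), u_{h(i,j)}(\vec{a}))$ of the copy
and auxiliary players. In order to force
$s_{g(i,j)} = s_i$, we set these payoffs as follows,
with rows indexed by $a_{g(i,j)}$ and columns by
$a_{h(i,j)}$:

                     a_i = 0                             a_i = 1
               a_h(i,j)=0  a_h(i,j)=1              a_h(i,j)=0  a_h(i,j)=1
  a_g(i,j)=0     (1,0)       (0,0)     a_g(i,j)=0    (1,0)       (0,1)
  a_g(i,j)=1     (0,1)       (1,0)     a_g(i,j)=1    (0,0)       (1,0)

We show in Appendix A that at Nash equilibrium, these
payoffs force $s_{g(i,j)} = s_i$. □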

Theorem 5.1 is a special case of Theorem 2.1(i). The
proof can be adapted to also show an analog of
Theorem 2.1(ii), but we omit the details here.

6 Related Work 

Joyce and Gibbard [12] describe one justification for 
mixed Nash equilibria in terms of causal decision theory. 
Specifically, they discuss a self-ratification condition 
that extends CDT to cases when one’s action is evidence 
of different underlying conditions that might change 
which actions are rational. An action self-ratifies if and 
only if it causally maximizes expected utility in a world 
model that has been updated on the evidence that this 
action is taken. 

For example, consider the setting of a matching pennies
game where players can predict each other accurately.
The fact that player A plays “heads” is evidence that
player B will predict that player A will play “heads”
and play “tails” in response, so player A would then
have preferred to play “tails”, and so the “heads”
action would fail to self-ratify. However, the mixed
strategy of flipping the coin would self-ratify. Our
reflection principle encodes some global constraints on
players’ mixed strategies that are similar to
self-ratification.

The question of how to model agents as an ordinary part
of the environment is of interest in the speculative
study of human-level and smarter-than-human artificial
intelligence [13, 14]. Although such systems are still
firmly in the domain of futurism, there has been a
recent wave of interest in foundational research aimed
at understanding their behavior, in order to ensure that
they will behave as intended if and when they are
developed [15, 16, 11].

Theoretical models of smarter-than-human intelligence
such as Hutter’s universally intelligent agent AIXI [17]
typically treat the agent as separate from the
environment, communicating only through well-defined
input and output channels. In the real world, agents run
on hardware that is part of the environment, and Orseau
and Ring [13] have proposed formalisms for studying
space-time embedded intelligence running on hardware
that is embedded in its environment. Our formalism might
be useful for studying idealized models of agents
embedded in their environment: While real agents must be
boundedly rational, the ability to study perfectly
Bayesian space-time embedded intelligence might help to
clarify which aspects of realistic systems are due to
bounded rationality, and which are due to the fact that
real agents aren’t cleanly separated from their
environment.

7 Conclusions and Future Work 

In this paper, we have introduced reflective oracles, a
type of probabilistic oracle which is able to answer
questions about the behavior of oracle machines with
access to the same oracle. We’ve shown that such oracle
machines can implement a version of causal decision
theory, and used this to establish a close relationship
between reflective oracles and Nash equilibria.

We have focused on answering queries about oracle
machines that halt with probability 1, but the
reflection principle presented in Section 2 can be
modified to apply to machines that do not necessarily
halt. To do so, we replace the condition

$$\mathbb{P}(M^O() = 1) < p \implies \mathbb{P}(O(M,p) = 0) = 1$$

by the condition

$$\mathbb{P}(M^O() \neq 0) < p \implies \mathbb{P}(O(M,p) = 0) = 1$$

This is identical to the former principle if $M^O()$ is
guaranteed to halt, but provides sensible information
even if there is a chance that $M^O()$ loops. Appendix B
proves the existence of reflective oracles satisfying
this stronger reflection principle.
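The modified principle can be written as a check that takes the probabilities of returning 1 and of returning 0 separately, so that any remaining probability mass stands for non-halting. This is a minimal sketch under the assumption that both probabilities are known; the function and parameter names are hypothetical.

```python
from fractions import Fraction


def satisfies_strong_reflection(prob_returns_1, prob_returns_0, p, oracle_prob_1):
    """Check the stronger reflection conditions for a possibly non-halting machine."""
    assert prob_returns_1 + prob_returns_0 <= 1
    prob_not_0 = 1 - prob_returns_0  # includes the chance of looping forever
    if prob_returns_1 > p and oracle_prob_1 != 1:
        return False
    if prob_not_0 < p and oracle_prob_1 != 0:
        return False
    return True


# A machine that returns 1 w.p. 1/4, returns 0 w.p. 1/4, and loops w.p. 1/2.
one, zero = Fraction(1, 4), Fraction(1, 4)
print(satisfies_strong_reflection(one, zero, Fraction(1, 5), 1))               # True
print(satisfies_strong_reflection(one, zero, Fraction(4, 5), 1))               # False
print(satisfies_strong_reflection(one, zero, Fraction(1, 2), Fraction(1, 3)))  # True
```

For thresholds between the two probabilities (here between 1/4 and 3/4), neither condition applies and the oracle is unconstrained.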

The ability to deal with non-halting machines opens up
the possibility of applying reflective oracles to
simplicity priors such as Solomonoff induction [18],
which defines a probability distribution over infinite
bit sequences by, roughly, choosing a random program and
running it. Solomonoff induction deals with computable
hypotheses, but is itself uncomputable (albeit
computably approximable) because it must deal with the
possibility that a randomly chosen program may go into
an infinite loop after writing only a finite number of
bits on its output tape. A reflective oracle version of
Solomonoff induction would be able to deal with a
hypothesis space consisting of arbitrary oracle
machines, while itself being implementable as an oracle
machine; this would make it possible to model a
predictor which predicts an environment it is itself
embedded in. We leave details to future work.
APPENDIX

A Nash Equilibria in a Variant of 
Matching Pennies 

Lemma A.1. Consider an n-player game with three
distinguished players, each of which has two pure
strategies: Player Row has strategies Up and Down,
player Column has strategies Left and Right, and player
Matrix has strategies Front and Back. Suppose that the
payoffs of (Row, Column) depend only on the strategies
of these three players, as follows:

               Front                        Back
            Left    Right                Left    Right
  Up       (1,0)    (0,0)       Up      (1,0)    (0,1)
  Down     (0,1)    (1,0)       Down    (0,0)    (1,0)

where the first matrix indicates the payoffs when Matrix
plays Front, and the second matrix indicates the payoffs
when Matrix plays Back.

Write $p$ for the probability that Row plays Down, and
$q$ for the probability that Matrix plays Back. At Nash
equilibrium, we have $p = q$.

Proof. • Case 1: $0 < q < 1$.

Suppose that there is a Nash equilibrium where Column
plays Left. Then Row would play Up, but then Column
would strictly prefer Right, which is a contradiction.

Suppose that there is a Nash equilibrium where Column
plays Right. Then Row would play Down, but then Column
would strictly prefer Left, which is a contradiction.

Thus, at every Nash equilibrium, Column must mix between
strategies. Hence, at equilibrium, Column must be
indifferent between Left and Right. This is equivalent
to $p(1 - q) = (1 - p)q$. This implies $p > 0$, since
otherwise we’d have $0(1 - q) = (1 - 0)q$, i.e. $0 = q$,
but we assumed $0 < q < 1$. Thus, we can divide the
equation by $pq$, yielding:

$$(1 - q)/q = (1 - p)/p$$

$$1/q - 1 = 1/p - 1$$

$$1/q = 1/p$$

$$\iff q = p$$

• Case 2: $q = 0$.

This gives us the following payoff matrix:

            Left    Right
  Up       (1,0)    (0,0)
  Down     (0,1)    (1,0)

Suppose that there is a Nash equilibrium with $p > 0$.
Then at this equilibrium, Column must play Left; but if
Column plays Left, then Row strictly prefers Up, which
contradicts $p > 0$. Hence, we must have $p = 0 = q$.

• Case 3: $q = 1$.

This gives us the following payoff matrix:

            Left    Right
  Up       (1,0)    (0,1)
  Down     (0,0)    (1,0)

Suppose that there is a Nash equilibrium with $p < 1$.
Then at this equilibrium, Column must play Right; but if
Column plays Right, then Row strictly prefers Down,
which contradicts $p < 1$. Hence, we must have
$p = 1 = q$.

□
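The lemma can be checked numerically on a grid. The sketch below assumes payoffs consistent with the best-response statements in the proof: Row receives 1 for Up against Left and for Down against Right, Column’s Left pays 1 on (Down, Front), and Column’s Right pays 1 on (Up, Back), so that Column is indifferent exactly when $p(1 - q) = (1 - p)q$. The grid and all names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def is_best_response(prob_of_1, payoff_of_0, payoff_of_1):
    """A mixed strategy is a best response iff it avoids strictly worse pure strategies."""
    if payoff_of_1 > payoff_of_0:
        return prob_of_1 == 1
    if payoff_of_1 < payoff_of_0:
        return prob_of_1 == 0
    return True


def equilibria(q, grid):
    """Pairs (p, c) with p = P(Row plays Down) and c = P(Column plays Right)."""
    found = []
    for p, c in product(grid, repeat=2):
        row_up, row_down = 1 - c, c                      # Row's expected payoffs
        col_left, col_right = p * (1 - q), (1 - p) * q   # Column's expected payoffs
        if is_best_response(p, row_up, row_down) and is_best_response(c, col_left, col_right):
            found.append((p, c))
    return found


grid = [Fraction(k, 10) for k in range(11)]
for q in grid:
    assert all(p == q for p, _ in equilibria(q, grid))
print("p == q at every grid equilibrium")
```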

B Proof of the Existence Theorem 

In this appendix, we prove Theorem 2.1(ii). Thus,
suppose that $R$ is a set of queries and $O$ is some
oracle; we want to show the existence of an oracle $O'$
which is reflective on $R$ and satisfies
$\mathbb{P}(O'(M,p) = 1) = \mathbb{P}(O(M,p) = 1)$ for
all $(M,p) \notin R$.

We will describe the behavior of $O'$ by a pair of
functions,
$\mathrm{query} : \mathcal{M} \times ([0,1] \cap \mathbb{Q}) \to [0,1]$
and $\mathrm{eval} : \mathcal{M} \to [0,1]$. The first
of these gives the distribution of $O'$, i.e.,
$\mathrm{query}(M,p) = \mathbb{P}(O'(M,p) = 1)$. The
second gives the distribution of a machine’s behavior
under $O'$: If $M$ almost surely returns either 0 or 1,
then $\mathrm{eval}(M) = \mathbb{P}(M^{O'}() = 1)$.

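Function pairs $(\mathrm{query}, \mathrm{eval})$ can be
seen as elements of
$A := [0,1]^{\mathcal{M} \times ([0,1] \cap \mathbb{Q})} \times [0,1]^{\mathcal{M}}$,
which is a convex and compact subset of the locally
convex topological vector space
$\mathbb{R}^{\mathcal{M} \times ([0,1] \cap \mathbb{Q})} \times \mathbb{R}^{\mathcal{M}}$
(with the product topology). We now define a
correspondence $f : A \to \mathrm{Pow}(A)$, such that
fixed points
$(\mathrm{query}, \mathrm{eval}) \in f(\mathrm{query}, \mathrm{eval})$
yield oracles $O'$ of the desired form.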

We define $f$ by giving a set of necessary and
sufficient conditions for
$(\mathrm{query}', \mathrm{eval}') \in f(\mathrm{query}, \mathrm{eval})$.
We place three conditions on $\mathrm{query}'(M,p)$: If
$(M,p) \in R$ and $\mathrm{eval}(M) > p$, then
$\mathrm{query}'(M,p) = 1$; if $(M,p) \in R$ and
$\mathrm{eval}(M) < p$, then $\mathrm{query}'(M,p) = 0$;
and if $(M,p) \notin R$, then
$\mathrm{query}'(M,p) = \mathbb{P}(O(M,p) = 1)$.

To describe the conditions on $\mathrm{eval}'(M)$, we
will consider the definition of “probabilistic oracle
machine” to include the initial state of the machine’s
working tapes, so that we can view the state of a
machine $M^O()$ after one step of computation as a new
machine $N^O()$. Then, any machine $M$ can be classified
as performing one of the following operations as its
first step of computation: (i) a deterministic
computation step, yielding a new state $N$, in which
case $\mathrm{eval}'(M) = \mathrm{eval}(N)$; (ii) a coin
flip, yielding a state $N$ with a rational probability
$p$ and another state $N'$ with probability $1 - p$, in
which case
$\mathrm{eval}'(M) = p \cdot \mathrm{eval}(N) + (1 - p) \cdot \mathrm{eval}(N')$;
(iii) halting, with the output tape containing 0 (in
which case $\mathrm{eval}'(M) = 0$) or 1 (in which case
$\mathrm{eval}'(M) = 1$) or some other output (in which
case $\mathrm{eval}'(M)$ is arbitrary); or (iv) an
invocation of the oracle on a pair $(M',p)$, yielding a
new state $N$ if the oracle returns 0 and a different
new state $N'$ if it returns 1. In the last case,
writing $q := \mathrm{query}(M',p)$, the condition is
$\mathrm{eval}'(M) = (1 - q) \cdot \mathrm{eval}(N) + q \cdot \mathrm{eval}(N')$.

Given a fixed point
$(\mathrm{query}, \mathrm{eval}) \in f(\mathrm{query}, \mathrm{eval})$,
define $O'$ by
$\mathbb{P}(O'(M,p) = 1) = \mathrm{query}(M,p)$. Then,
it can be shown by induction that for every
$T \in \mathbb{N}$ and every $M \in \mathcal{M}$,
$\mathrm{eval}(M)$ is at least the probability that
$M^{O'}()$ returns 1 after at most $T$ timesteps, and at
most the probability that it does not return 0 within
this time bound; in the limit, we obtain

$$\mathbb{P}(M^{O'}() = 1) \le \mathrm{eval}(M) \le \mathbb{P}(M^{O'}() \neq 0)$$

Together with the conditions on $\mathrm{query}(M,p)$,
this shows that

$$\mathbb{P}(M^{O'}() = 1) > p \implies \mathbb{P}(O'(M,p) = 1) = 1$$

$$\mathbb{P}(M^{O'}() = 0) > 1 - p \implies \mathbb{P}(O'(M,p) = 0) = 1$$

which is a strengthening of the conditions of Section 2:
it is equivalent in the case where $M^{O'}()$ halts with
probability 1, but provides information even if
$M^{O'}()$ may fail to halt.

It remains to be shown that $f(\cdot)$ has a fixed
point. To do so, we employ the infinite-dimensional
generalization of Kakutani’s fixed-point theorem [19].

It is clear from the definition that
$f(\mathrm{query}, \mathrm{eval})$ is non-empty, closed
and convex for all $(\mathrm{query}, \mathrm{eval}) \in A$.
Hence, to show that $f$ has a fixed point, it is
sufficient to show that it has closed graph.

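Thus, assume that we have sequences
$(\mathrm{query}_n, \mathrm{eval}_n) \to (\mathrm{query}, \mathrm{eval})$
and
$(\mathrm{query}'_n, \mathrm{eval}'_n) \to (\mathrm{query}', \mathrm{eval}')$,
such that
$(\mathrm{query}'_n, \mathrm{eval}'_n) \in f(\mathrm{query}_n, \mathrm{eval}_n)$
for every $n$; we need to show that then,
$(\mathrm{query}', \mathrm{eval}') \in f(\mathrm{query}, \mathrm{eval})$.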

For the conditions on $\mathrm{eval}'$, we can simply
take the limit $n \to \infty$ on both sides of each
equation. The condition on $\mathrm{query}'(M,p)$ for
$(M,p) \notin R$ is clearly fulfilled, since
$\mathrm{query}'_n(M,p)$ is constant in this case. The
two remaining conditions on $\mathrm{query}'(M,p)$ are
entirely symmetrical; without loss of generality,
consider the case $\mathrm{eval}(M) > p$, $(M,p) \in R$.

In this case, since
$(\mathrm{query}_n, \mathrm{eval}_n) \to (\mathrm{query}, \mathrm{eval})$
and convergence is pointwise, there must be an $n_0$
such that $\mathrm{eval}_n(M) > p$ for all $n \ge n_0$.
Since
$(\mathrm{query}'_n, \mathrm{eval}'_n) \in f(\mathrm{query}_n, \mathrm{eval}_n)$,
it follows that $\mathrm{query}'_n(M,p) = 1$ for all
$n \ge n_0$, whence $\mathrm{query}'(M,p) = 1$ as
desired. This completes the proof.


